New versions of PageRank employing alternative Web document models

نویسندگان

  • Mike Thelwall
  • Liwen Vaughan
چکیده

We introduce several new versions of PageRank (the link based Web page ranking algorithm), based upon an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based upon directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced based upon these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects’ rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites; however, it does not work well in ranking pages that are from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using SiteRank for Decentralized Computation of Web Document Ranking

The PageRank algorithm demonstrates the significance of the computation of document ranking of general importance or authority in Web information retrieval. However, doing a PageRank computation for the whole Web graph is both time-consuming and costly. State of the art Web crawler based search engines also suffer from the latency in retrieving a complete Web graph for the computation of PageRa...

متن کامل

The Generalized Web Surfer

Different models have been proposed for improving the results of Web search by taking into account the link structure of the Web. The PageRank algorithm models the behavior of a random surfer alternating between random jumps to new pages and following out links with equal probability. We propose to improve on PageRank by using an intelligent surfer that combines link structure and content to de...

متن کامل

Recursive Attribute Factoring

Clustering, or factoring of a document collection attempts to “explain” each observed document in terms of one or a small number of inferred prototypes. Prior work demonstrated that when links exist between documents in the corpus (as is the case with a collection of web pages or scientific papers), building a joint model of document contents and connections produces a better model than that bu...

متن کامل

A Survey on PageRank Computing

This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underly PageRank processing. Computing...

متن کامل

A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Aslib Proceedings

دوره 56  شماره 

صفحات  -

تاریخ انتشار 2004